Infant Category Learning 1 Modeling Age Differences in Infant Category Learning
Abstract
We used an encoder version of cascade-correlation to simulate Younger and Cohen’s finding that 10-month-olds recover attention on the basis of correlations among stimulus features, but 4- and 7-month-olds recover attention on the basis of stimulus features. We captured these effects by varying the score-threshold parameter in cascade-correlation, which controls how deeply training patterns are learned. When networks learned deeply, they showed more error to uncorrelated than to correlated test patterns, indicating that they abstracted correlations during familiarization. When prevented from learning deeply, networks decreased error during familiarization, and showed as much error to correlated as to uncorrelated tests, but less than to test items with novel features, indicating that they learned features but not correlations among features. Our explanation is that older infants learn more from the same exposure than do younger infants. Unlike previous explanations that postulate unspecified qualitative shifts in processing with age, our explanation focuses on quantitatively deeper learning with increasing age. Finally, we provide some new empirical evidence to support this explanation.

Using a stimulus-familiarization-and-recovery procedure to investigate infant categorization, Younger and Cohen (1983, 1986) found that young infants tend to process information about independent features of visual stimuli, whereas older infants are able to abstract relations among those features. These findings on category learning are relevant to a longstanding controversy about the extent to which perceptual development involves integration (Hebb, 1949) or differentiation (Gibson, 1969) of stimulus information. A developing ability to understand relations among features suggests that perceptual development involves information integration.
In fact, Cohen and colleagues (Cohen, 1991; 1998; Cohen, Chaput, & Cashon, 2002) have noted the generality of this integration process, citing examples that range from infants' perception of an angle (Cohen & Younger, 1984) to their understanding of complex causal events (Cohen, Rundell, Spellman, & Cashon, 1999).

Infants are well known to decrease their attention to repeated stimuli and to recover their attention to novel stimuli, but not to familiar stimuli. Recovery of attention to a novel test stimulus suggests its novelty; little or no recovery indicates familiarity. When applied to infant category learning, this pattern of decrease and selective recovery of attention presumably signals the gradual construction of a conceptual structure for a category of stimuli and the discrimination of novel stimuli fitting that category from novel stimuli not fitting the category. Infants are assumed to construct representational categories for repeated stimuli, ignoring novel stimuli that are consistent with the category, while concentrating on those novel stimuli that are not members of the category. See Cohen and Younger (1983) or Younger and Cohen (1985) for reviews of this literature.

There also is evidence that the use of correlated features in category formation becomes increasingly important with development. After repeated presentation of visual stimuli with correlated features, young infants recover attention to stimuli with novel features more than to stimuli with either correlated or uncorrelated familiar features (Younger & Cohen, 1983, 1986). In contrast, older infants recover attention to both stimuli with novel features and stimuli with familiar uncorrelated features more than to stimuli with familiar correlated features.
This pattern of recovery of attention suggests that young infants have learned about the individual stimulus features, but not about the relationship among features, whereas older infants have also learned about how these features are correlated with one another.

Here we report on artificial neural network simulations of these infant experiments in an attempt to better understand the nature of age differences in category learning. In subsequent sections of the paper, we describe the infant experiments in greater detail, outline the simulation techniques, report and discuss our results, and mention some new empirical findings relevant to the simulations.

The Infant Experiments

We focus first on the most important finding, the interaction between age and preference for correlated and uncorrelated test stimuli. We then cover additional, related findings on discrimination of a correlated test stimulus and preference for novel feature values.

Age x Test-stimulus Interaction:

The structure of the stimuli used in Younger and Cohen’s Experiment 2 (1986) is shown in Table 1. There were four familiarization stimuli that were repeatedly presented to the infants, labeled as stimuli 1-4 in Table 1. The stimuli were line drawings of imaginary animals, characterized by three features (labeled A-C) with three values (labeled 1-3) each. The body could be that of a giraffe, cow, or elephant. The tail could be feathered, fluffy, or a horse's tail. And the feet could be webbed, hoofed, or club feet. Two sets of stimuli (labeled 1 or 2) were used. In each set, features A and B were perfectly correlated across stimuli. Feature C was uncorrelated with features A and B.

------------------------------
Insert Table 1 about here
------------------------------

The participants were 24 infants at each of three ages: 4, 7, and 10 months. Each infant received twelve 20-second familiarization trials on one of the two sets of animals, with each stimulus presented three times.
Following the familiarization procedure, each infant received 20-second presentations of each of the three test stimuli, characterized in the bottom section of Table 1. The correlated test stimulus preserved both the feature values (labeled 1 and 2) and the correlation pattern of the familiarization stimuli. The uncorrelated test stimulus used the same feature values but broke the pattern of correlation between features A and B. Finally, the novel test stimulus employed novel feature values (labeled 3).

Mean fixation times per trial to the three test stimuli by 4- and 10-month-old infants are presented in Figure 1. As noted, 4-month-olds recovered attention more to the novel stimulus than to the correlated and uncorrelated stimuli, whereas 10-month-olds recovered attention more to the uncorrelated and novel stimuli than to the correlated stimulus. Thus, 4-month-olds seem to have learned only about the feature values, whereas 10-month-olds also learned about the pattern of correlations between features during the familiarization procedure. Curiously, 7-month-olds failed to habituate their attention during the familiarization phase, so their recovery data are not easily interpretable. It is as if they failed to learn anything about the stimuli.

------------------------------
Insert Figure 1 about here
------------------------------

Explaining the Age x Test-stimulus Interaction:

Experiment 3 (Younger & Cohen, 1986) tested two different explanations of the interaction between age and test stimulus: increasing sensitivity to correlations versus similarity between habituation and test stimuli. In Experiment 2, the correlated test item was most similar to those in the familiarization set (indeed it was identical to stimulus 4), followed by the uncorrelated test item, and then the novel test item. It is thus possible that age differences in recovery reflect a lower novelty threshold for older infants than for younger infants.
Older infants may have responded more to the uncorrelated and novel test items because these stimuli were less similar to the familiarization items than was the correlated test stimulus. Younger infants, in contrast, may have responded to both correlated and uncorrelated items as familiar because neither item reached their novelty threshold. On the other hand, it might have been, as previously stated, that the older infants were sensitive to correlation information and the younger infants were not.

These two explanations were tested in Experiment 3 by removing the correlated test item from the familiarization set. This ensured that the uncorrelated test item was now most similar to the familiarization set, followed by the correlated test item, and finally by the novel test item. Eighteen infants were tested at each of two ages: 7 and 10 months, 9 infants per condition. Each infant received nine 20-second familiarization trials in which each stimulus was presented three times. Following the familiarization phase, each test stimulus was presented once for 20 seconds.

Again the 7-month-olds failed to habituate, thus rendering their recovery data difficult to interpret. The 10-month-olds showed the same recovery pattern as in Experiment 2, as shown in Figure 2. If recovery was based on similarity, they should have recovered attention more to the correlated test item than to the uncorrelated test item. But if recovery was based on sensitivity to correlations, they should have recovered attention more to the uncorrelated test item than to the correlated test item, which they in fact did.

We recently ran eleven 4-month-olds on essentially the same task as Experiment 3 (Cohen & Arthur, unpublished). The 4-month-olds' data are also presented in Figure 2 for comparison. As can be seen in the figure, 4-month-olds once again produced the pattern characteristic of responding to independent features rather than the correlation among features.
------------------------------
Insert Figure 2 about here
------------------------------

We take the interaction between age and test stimulus shown in both Figures 1 and 2 to be the main results to cover in a computer simulation. Similar interactions were reported in Experiment 1 of Younger and Cohen (1983), an experiment using five features, three of which were correlated in the familiarization phase. For simulation purposes, we prefer the design of Experiment 3 (1986) because of the extra control provided by removing the correlated test stimulus from the familiarization set. It is a foregone conclusion that artificial neural networks would perform better with trained than with novel items.

Discrimination of a Correlated Test Stimulus:

It was also reported that infants discriminated a correlated test stimulus from a similar stimulus in the familiarization set when both stimuli possessed two additional uncorrelated features, namely kind of ears and number of legs (Younger & Cohen, 1983, Experiment 2). This became of interest in order to differentiate two possible explanations for 10-month-olds’ preference for an uncorrelated test stimulus (Younger & Cohen, 1983, Experiment 1) over a correlated test stimulus. This preference could have been based on correlation detection or on failure to discriminate the correlated test stimulus from the familiarization stimuli.

The infants in this experiment were twenty 10-month-olds. They received four 20-second presentations of one of the four familiarization stimuli in Table 2 (a, b, c, or d), and were then tested twice for 20 seconds each on this familiar stimulus and the test stimulus shown in the third row and same column of Table 1 (the correlated test stimulus from familiarization set 1 or 2 of Experiment 1, 1983). Mean fixation time per trial was significantly less to the familiar stimulus (7.31 s) than to the correlated test stimulus
This result confirmed that the 10-month-old infants did distinguish between familiarization and correlated test stimuli, and suggests that the 10-month-olds’ preference for an uncorrelated test stimulus in Experiment 1 (1983) was indeed based on their ability to detect inter-feature correlations in familiarization stimuli. Preference for Novel Feature Values: In these experiments it was common for infants to show more recovery of attention to test stimuli with novel feature values than to test stimuli with familiar feature values, whether these test stimuli preserved correlations found within the familiarization set or not (Younger & Cohen, 1983, Experiments 1 and 3, and 1986, experiments 1-4). This was particularly true for 4-month-olds. Ten-month-olds also looked longer at novel test stimuli than at correlated test stimuli, but they seemed to look about equally at the novel and uncorrelated test stimuli. One can question the generality of this last finding, however. In a recent study (Cohen & Arthur, unpublished), which we shall describe shortly, our 10-month-old infants looked longer at the novel test stimulus than at the uncorrelated test stimulus. Simulation Algorithms One of the popular techniques for simulating familiarization experiments with infants uses so-called encoder networks (Mareschal & French, 2000; Mareschal, French, & Quinn, 2000; Shultz & Bale, 2001). Encoder networks have the same number of input units as output units, and their primary task is to reproduce their input values onto their output units. They do this by encoding the input representation onto a set of hidden units and then decoding that hidden-unit representation onto the output units. Because the teaching signal is contained in the stimulus input and network error is the squared Infant Category Learning 10 difference between inputs and outputs, this is a case of unsupervised learning about stimuli from mere stimulus exposure. 
Encoder networks essentially develop a recognition memory for the stimuli they are exposed to. Network error can be used as an index of stimulus novelty.

Encoder networks can be implemented within a variety of neural network learning algorithms, such as back-propagation and cascade-correlation. We favor the cascade-correlation algorithm, which has been successfully applied to a large number of phenomena in cognitive development (Shultz, 2003). Later we compare our cascade-correlation results to simulations using the back-propagation learning algorithm.

Both cascade-correlation and back-propagation algorithms implement learning in so-called feed-forward networks, in which neural activation is propagated in a forward direction from input units to hidden units and on to output units. As well, both algorithms learn by adjusting connection weights between units in order to reduce network error. Back-propagation is normally used in static networks that do not change network topology once it has been designed by the programmer. In contrast, cascade-correlation networks grow as well as learn, by recruiting new hidden units into the network as needed. Thus, cascade-correlation networks are able to simulate the increases in computational and representational power that have been featured in constructivist accounts of development (Mareschal & Shultz, 1996). The process of network growth in cascade-correlation may correspond to synaptogenesis and/or neurogenesis in the brain (Quartz & Sejnowski, 1997). An encoder option in cascade-correlation freezes direct input-output connections at 0 to prevent trivial solutions in which weights of 1 are learned between each input and its corresponding output.

Encoding of Stimuli for Simulation

The stimulus features in the Younger and Cohen experiments have three different values that we encode on two binary input units.
Binary coding makes sense here because the stimulus features differ qualitatively, rather than quantitatively. Even a feature like number of legs may be qualitatively represented by infants, as in biped versus quadruped. Our scheme for converting integers to binary values is shown in Table 3. The values of -0.5 and 0.5 are the target values that cascade-correlation uses for binary output units, so we use them instead of the more conventional binary values of 0 and 1.

For each feature in each network, we randomly selected three out of the four possible binary codes, by generating three random integers without replacement between 0 and 3 inclusive. For example, if the Younger and Cohen values of 1, 2, and 3 in Table 1 are randomly paired with 2, 0, and 3 respectively in Table 3, then a stimulus coded by Younger and Cohen as (1 1 2) would be coded as (.5 -.5 .5 -.5 -.5 -.5) for the simulation. Because of different random assignments for each network, another network may code the same stimulus differently. This ensures that overall simulation results cannot be influenced by particular code assignments. With any spurious correlations between features removed by random assignment, any correlations detected by the networks would be based solely on co-occurrence of the stimulus feature values, not confounds with a particular coding assignment. Also, just as in the infant experiments, the novel test stimulus has the same three features as the training stimuli, but with novel values.

Cascade-Correlation Parameter Settings

We implemented age differences by varying the score-threshold parameter in cascade-correlation networks. This parameter controls how deeply training patterns are learned. Training stops when all outputs for all training patterns are within score-threshold of their targets. Previously, we had used variations in score-threshold to simulate age differences in discrimination shift learning in older children (Sirois & Shultz, 1998).
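The coding scheme and the stopping rule described above can be sketched as follows. This is an illustrative sketch, not the simulator used for the reported results. Table 3 is not reproduced here, so the code table is partly assumed: the codes for integers 0 and 2 follow from the worked example in the text, while those for 1 and 3 assume plain binary order. The function names are hypothetical, and treating "within" as less-than-or-equal is also an assumption.

```python
import random

# Two-unit codes for the integers 0-3. Codes for 0 and 2 follow the worked
# example in the text; codes for 1 and 3 are ASSUMED (plain binary order).
BINARY_CODES = {
    0: (-0.5, -0.5),
    1: (-0.5, 0.5),
    2: (0.5, -0.5),
    3: (0.5, 0.5),
}


def random_assignment(rng):
    """Pair feature values 1-3 with three of the four codes, drawn without replacement."""
    codes = rng.sample(range(4), 3)
    return {value: code for value, code in zip((1, 2, 3), codes)}


def encode_stimulus(stimulus, assignments):
    """Encode a stimulus such as (1, 1, 2) using one value-to-code assignment per feature."""
    pattern = []
    for value, assignment in zip(stimulus, assignments):
        pattern.extend(BINARY_CODES[assignment[value]])
    return tuple(pattern)


def within_score_threshold(outputs, targets, score_threshold):
    """True when every output is within score_threshold of its target (<= is assumed)."""
    return all(abs(o - t) <= score_threshold for o, t in zip(outputs, targets))


# Worked example from the text: values 1, 2, 3 paired with codes 2, 0, 3.
example = {1: 2, 2: 0, 3: 3}
encoded = encode_stimulus((1, 1, 2), [example] * 3)
print(encoded)  # (0.5, -0.5, 0.5, -0.5, -0.5, -0.5)
```

In a full simulation, `random_assignment` would be called once per feature per network, and training would stop only when `within_score_threshold` holds for every training pattern.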
The assumption is that older children (implemented with a lower score-threshold) learn more from the same stimulus exposure time than younger children do.

Nine cascade-correlation networks were run at each of two levels of score-threshold (0.25 and 0.15) with each of the two stimulus sets described earlier, for a total of 36 networks. Each network develops a somewhat different solution because it begins with randomly chosen connection weights and uses a random assignment of stimulus coding values.

Output and input limits govern the maximum number of epochs in the output and input phases of cascade-correlation learning. In output phases, connection weights leading to output units are adjusted in order to reduce network error. In input phases, connection weights leading to candidate hidden units are adjusted in order to maximize a correlation between candidate hidden unit activation and network error. An epoch is a single pass through the training patterns. Output and input limits were set to the default values of 100 epochs.

Output and input patience parameters govern the maximum number of epochs to continue within a phase without significant change in network error (for output phases) or correlation with network error (for input phases). When this number of epochs goes by without a significant change, the algorithm changes phase, either from output to input phase (to recruit a new hidden unit) or from input phase to output phase (to determine how to best use a new hidden unit). These patience values were set to 1 as in previous infant habituation simulations (Shultz & Bale, 2001). All other cascade-correlation parameters were left at default values.

Cascade-Correlation Results

Age x Test-item Interaction

Networks run at a score-threshold of 0.25 required a mean of 56.2 epochs and recruited a mean of 3.1 hidden units. Those run at a score-threshold of 0.15 required a mean of 59.1 epochs and recruited a mean of 3.3 hidden units.
Mean network error to the correlated and uncorrelated test stimuli is plotted in Figure 3 for the two levels of score-threshold. This was the key interaction to be examined. Dependent t-tests revealed no effect of test stimulus at the higher score-threshold value of 0.25, t(8) = .735, ns for familiarization set 1, and t(8) = .814, ns for familiarization set 2. However, error was less at the lower score-threshold of 0.15 for the correlated test stimulus than for the uncorrelated test stimulus, t(8) = -3.421, p < .05 for familiarization set 1, and t(8) = -8.711, p < .05 for familiarization set 2. Assuming that the higher score-threshold represents 4-month-olds and the lower score-threshold represents 10-month-olds, this reproduces the interaction pattern found with infants. The fact that this interaction between age and test stimuli occurs in a simulation of Younger and Cohen’s (1986) Experiment 3 supports the hypothesis that the effect is due to differential correlation detection rather than differential similarity between test and familiarization stimuli.

------------------------------
Insert Figure 3 about here
------------------------------

One of the advantages of computer modeling is the ability to generate specific predictions. We ran the same simulation using nine additional networks trained on each of the two familiarization sets with an even higher score-threshold of 0.5. Mean network error is plotted in Figure 4, along with the means for the lower score-thresholds already reported. Recall that these simulations were conducted as in Younger and Cohen’s (1986) Experiment 3, which was designed to test whether the age x test-stimulus interaction was a correlation effect or a similarity effect.
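For readers who want to see where the 8 degrees of freedom in the dependent t-tests come from, the sketch below computes a paired t statistic for nine networks, each contributing one error value per test stimulus. The error values are hypothetical placeholders, not the reported data, and the function name `dependent_t` is our own.

```python
import math
import statistics


def dependent_t(first, second):
    """Paired (dependent) t statistic and its degrees of freedom."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# HYPOTHETICAL error values for nine networks (not the reported data).
correlated = [0.30, 0.28, 0.35, 0.31, 0.27, 0.33, 0.29, 0.32, 0.30]
uncorrelated = [0.41, 0.39, 0.40, 0.45, 0.36, 0.44, 0.38, 0.43, 0.42]

t, df = dependent_t(correlated, uncorrelated)
print(df)  # 8
```

A negative t in this arrangement means lower error to the correlated than to the uncorrelated test stimulus, the direction reported at the 0.15 score-threshold.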
Raising score-threshold to the higher value of 0.5 produces more error to the correlated stimulus than to the uncorrelated stimulus, t(8) = 3.250, p < .05 for familiarization set 1, and t(8) = 5.880, p < .05 for familiarization set 2. At this higher level of score-threshold, the results are more in line with a similarity effect, i.e., more error to (or interest in) the correlated test item than to the uncorrelated test item. Recall that it was the uncorrelated test item that was most similar to those in the familiarization set of Experiment 3. Thus, the simulations generate a crossover prediction, with deep learning (low score-threshold) showing a correlation effect, and more superficial learning (high score-threshold) showing a similarity effect.

------------------------------
Insert Figure 4 about here
------------------------------

Another advantage of computer simulations is the ability to examine the emergence of these differences over time, in this case over the entire familiarization phase. With infants, one can track the decrease in attention to the familiarization stimuli, but one cannot repeatedly measure attention to the test stimuli without thereby affecting that attention. Essentially, it is impossible to prevent infants from learning something about the test stimuli. However, this can be done with artificial neural networks because we can entirely exclude test stimuli from the training set.

Results from two representative networks, one from each age condition, are presented in Figures 5 and 6. Figure 5 shows mean error to familiarization and test stimuli for a network run at the lower score-threshold of 0.15 plotted over output epochs. Because error does not change during input phases, there is no need to include input-phase epochs in such plots. The exponential decrease in training error is typical of both habituation in infants and cascade-correlation simulations of that habituation (Shultz & Bale, 2001).
The particular epochs at which hidden units were recruited are indicated by triangles along the bottom of the graph. It is typical for such recruitments to produce subsequent sharp declines in network error. Comparison of the decreasing error to the two test patterns shows an early similarity effect in which error is higher to the correlated test item than to the uncorrelated test item (epochs 5-20) followed by an eventual preference for the uncorrelated test item in the final few epochs, indicating the correlation-detection effect shown in Figures 3 and 4.

------------------------------
Insert Figure 5 about here
------------------------------

Figure 6 shows mean error to familiarization and test stimuli for a network run at the higher score-threshold of 0.25, again plotted over output epochs. An exponential decrease in training error is again evident here, with sharper declines after hidden unit recruitment. Between epochs 5 and 13, there is a temporary preference for the correlated test item. But instead of an eventual reversal to a preference for the uncorrelated test item as seen in networks run at a score-threshold of 0.15, learning ends with equal preferences for these two test items. That is, an early similarity effect gives way to an eventual lack of difference. This lack of difference between correlated and uncorrelated test items is characteristic of superficially trained networks shown in Figures 3 and 4. We discuss ways of testing these network predictions and some initial empirical data on those predictions in our final Discussion section.

------------------------------
Insert Figure 6 about here
------------------------------

Discrimination of a Correlated Test Stimulus:

To simulate infants' ability to discriminate the correlated test stimulus from the familiarization set, we ran 20 cascade-correlation networks in each of the four conditions portrayed in Table 2.
Score-threshold was left at the default value of 0.4, and output and input patience parameters were set to 1. Results are presented in Table 4, where conditions a-d correspond to those in Table 2. In every condition, there was significantly more error to the correlated novel test stimulus than to the familiar stimulus, showing that the networks clearly discriminated the two test stimuli. This effect is completely unsurprising for artificial neural networks, which normally show less error on stimuli they have been trained on. Indeed, this effect is considerably stronger in networks than in infants, but the important thing is that the effect is in the proper direction and statistically reliable.

------------------------------
Insert Table 4 about here
------------------------------

Preference for Novel Feature Values

It was noted in many of the Younger and Cohen infant experiments that the test stimuli with novel feature values attracted more attention than the test stimuli with familiar feature values. This was also true in our cascade-correlation simulations. Table 5 presents the mean network error to the test stimulus with novel features in our first simulation (of Experiment 3 from Younger & Cohen, 1986). Comparison to Figures 3 and 4 indicates that these values are about three times higher than the corresponding values for test stimuli with familiar features. Although this is too large an effect to closely mimic the infant data, the important consideration once again is that it is in the right direction. It is well to remember that the novel-values condition is just a control condition, and that it is somewhat arbitrary how we code these novel values in simulations. We actually coded them as being sharply different from the familiar feature values, but perhaps infants see them as only slightly different.
Also, there may be a natural limit to attention in the test phase based on exposure time, and this was ignored in our simulations in that stimulus error could range above such a limit.

Comparative Simulations with Static Back-Propagation Networks

To investigate the range of artificial neural network algorithms that could capture these infant data, we tried comparable simulations using static back-propagation networks, arguably the most popular of neural networks. Again, the most important phenomena to cover in this area are those in our first simulation, on the age x test-stimulus interaction found by Younger and Cohen (1986, Experiments 2 and 3). Recall that their 4-month-old infants recovered attention more to test stimuli with novel feature values than to test stimuli with familiar feature values, showing no difference between correlated and uncorrelated test stimuli. In contrast, 10-month-olds showed greater recovery to both novel-feature-value test items and to uncorrelated test items than to correlated test items. This suggested that the 4-month-olds had learned about the stimulus features and that 10-month-olds had additionally learned about the correlations between features.

Cascade-correlation networks covered this interaction when score-threshold values governed how deeply the familiarization stimuli were learned. At a loose score-threshold of 0.25, cascade-correlation networks performed like 4-month-olds, showing no difference between correlated and uncorrelated test items, and more error to items with novel feature values. At a stricter score-threshold of 0.15, the networks performed like 10-month-olds, showing less error to correlated test items than to novel and uncorrelated test items. We noted that cascade-correlation networks, at both levels of score-threshold, recruited about three hidden units during the first, familiarization phase, and took about 55-60 epochs to learn in that phase.
We investigated whether static back-propagation encoder networks, similarly equipped with three hidden units, could cover these key interactions between score-threshold (implementing age differences) and test stimulus. Our back-propagation simulator was constructed to use a score-threshold parameter to decide on learning success, just as in the cascade-correlation algorithm. This permits a more direct comparison of cascade-correlation networks to back-propagation networks, which more typically stop learning when a particular error criterion is reached. We varied the network topology to explore the possible roles of network depth and the presence of cross connections in simulation success. The binary coding scheme that we used in the cascade-correlation simulations was also used here.

In all back-propagation simulations, we used the default parameter settings of 0.5 for learning rate and 0.9 for momentum. The learning-rate parameter scales the sizes of weight changes and is generally set to a moderate value to allow reasonably fast learning, but not so fast that weight changes oscillate wildly over more optimal values. The momentum parameter gives weights a relative degree of inertia or momentum, so that they will change less when the last change was small and change more when the last change was large. The idea is to induce larger weight changes when the weight is far from the minimum error, and small weight changes when the weight is closing in on the minimum error.

Nine networks were run in each condition, and a wide variety of score-threshold values were explored in a systematic attempt to cover the infant data. We varied score-thresholds in the range of 0.05 to 0.50 in increments of 0.05 in order to sample a wide range of depth of learning. Thus, there were 9 networks at each of 10 score-threshold values for each of the 2 familiarization sets, yielding 180 networks. Networks ran for a maximum of 300 epochs.
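The roles of the learning-rate and momentum parameters, and the size of the simulation design, can be sketched as follows. The update rule shown is the standard gradient-descent-with-momentum form, which we assume matches the simulator; the function name `momentum_step` and the example gradient values are hypothetical.

```python
def momentum_step(weight, gradient, previous_change, learning_rate=0.5, momentum=0.9):
    """One gradient-descent weight update with momentum.

    Returns the new weight and the change just applied, so that the change
    can be fed back in as previous_change on the next step.
    """
    change = -learning_rate * gradient + momentum * previous_change
    return weight + change, change


# Two successive steps with the same (hypothetical) gradient: the second
# change is larger because it inherits part of the first one.
w1, c1 = momentum_step(1.0, 0.2, 0.0)
w2, c2 = momentum_step(w1, 0.2, c1)

# Design of the first back-propagation simulation described in the text.
score_thresholds = [round(0.05 * k, 2) for k in range(1, 11)]  # 0.05 ... 0.50
familiarization_sets = (1, 2)
networks_per_cell = 9
total_networks = networks_per_cell * len(score_thresholds) * len(familiarization_sets)
print(total_networks)  # 180
```

With a momentum of 0.9, repeated changes in the same direction accumulate, whereas changes that alternate in sign partly cancel, which is the damping of oscillation that the text describes.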
Because the most standard back-propagation networks place all hidden units on a single layer without either cascaded weights between hidden units or cross connections bypassing hidden unit layers, that is what we tried first. In addition to the bias input unit, which is always on, there were six input units coding the stimuli, three hidden units on the second layer, and six output units on which the network registered its response. All of our back-propagation results are reported only briefly because they were unsuccessful.

These 6-3-6 networks had difficulty detecting correlations between stimulus features. Again, the signature of such correlation detection would be higher error on uncorrelated test items than on correlated test items. Next we tried a deeper static back-propagation network to see whether cascade-correlation’s advantage of greater network depth is sufficient to cover the infant data. These 6-1-1-1-6 networks produced about the same results as the flat 6-3-6 networks did. If there was any difference at all, there was slightly higher error to correlated than to uncorrelated test stimuli (a similarity effect), but it was rare to find any significant difference in t-tests comparing error on the correlated versus uncorrelated test stimuli.

Finally, we tried adding cross-connections to both the 6-3-6 and the 6-1-1-1-6 networks, always eliminating direct input-to-output connections as we did in the cascade-correlation simulations. In both cases, the results were the same as before. No comparison ever revealed lower error on correlated than on uncorrelated test items, the signature of effective correlation detection.

Thus, back-propagation networks were unable to capture the results of either 4-month-olds or 10-month-olds in this critical experiment, showing neither a consistent similarity effect nor any correlation effect at all.
Both of these effects, and the developmental transition between them, seem to require a network that is capable of growing while learning, namely cascade-correlation. Although it is difficult to claim that an algorithm such as back-propagation cannot in principle cover a set of phenomena, it is clear that we gave this algorithm a fair chance, running nine networks in each of 80 simulations (2 familiarization sets × 10 score-threshold levels × 4 network topologies).
Discussion
In our cascade-correlation simulations, when networks learned deeply, aided by a low score-threshold, they showed more error to uncorrelated than to correlated test patterns, indicating that they abstracted correlations during habituation. When prevented from learning deeply by a higher score-threshold, they decreased error during familiarization, and showed as much error to correlated as to uncorrelated tests, but less than to novel tests, indicating that they learned features but not correlations among features. Our explanation of the psychological data, based on simulations, is that older infants learn more from the same exposure time than do younger infants. Unlike previous explanations that postulate unspecified qualitative shifts in processing with age, our explanation focuses on quantitatively deeper learning with increasing age. We also generated a crossover prediction, with deep learning showing a correlation effect and even more superficial learning showing a similarity effect. This simulation predicts, for example, that within a single, suitable age group, say 10-month-olds, there would be a correlation effect under optimal learning conditions and a similarity effect with less than optimal familiarization learning. When networks were repeatedly tested over the familiarization phase, those with a lower score-threshold (deeper learning) showed an early similarity effect followed by the eventual correlation effect.
In contrast, networks with a higher score-threshold (shallower learning) showed an early similarity effect followed by an eventual non-difference between correlated and uncorrelated test stimuli. If infants of different ages could be given occasional tests during the familiarization phase, these predictions too could be tested. In fact, we have just begun to run one such study. We (Cohen & Arthur, unpublished) also have just completed an experiment with 10-month-old infants in an attempt to replicate the results reported by Younger and Cohen (1986, Experiment 3). The stimuli were the same animals used by Younger and Cohen. But instead of giving all infants nine 20-s familiarization trials as in the original study, we attempted to habituate the infants to a stringent criterion (looking time dropping to 50% of the average of the first 3 trials). After reaching the habituation criterion (or after a maximum of 18 trials, if they did not habituate), all infants were tested with the correlated, uncorrelated, and novel test stimuli. Forty-four infants were tested in all. Thirty-two of them habituated; 12 did not. The results, divided by habituators versus nonhabituators, are shown in Figure 7. As one can see in the figure, the habituators looked longer at the uncorrelated than at the correlated test stimulus, but the nonhabituators did just the opposite, looking longer at the correlated than at the uncorrelated test stimulus. This interaction was significant, F(1, 42) = 5.298, p < .03. Thus, we have the first psychological evidence of the shallow (similarity) versus deep (correlation) types of learning predicted by the model. It will be interesting to see whether this difference is merely a function of the time needed to learn or is a difference in learning style (i.e., shallow versus deep). One can also see from Figure 7 that both groups of infants looked longest at the novel test stimulus. This finding was also predicted by our model.
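The habituation criterion used in this experiment can be sketched as a simple rule. The sketch assumes that the criterion is met on the first trial, after the three baseline trials, whose looking time is at or below 50% of the mean of those first three trials. The text does not say whether single trials or blocks of trials were compared, so this is one plausible reading rather than the exact procedure. The function name is hypothetical.

```python
from typing import Optional, Sequence

BASELINE_TRIALS = 3       # criterion is relative to the first 3 trials
CRITERION_FRACTION = 0.5  # looking time must drop to 50% of baseline
MAX_TRIALS = 18           # familiarization stops here regardless


def trial_reaching_criterion(looking_times: Sequence[float]) -> Optional[int]:
    """Return the 1-based trial on which the criterion is met, or None
    if the infant does not habituate within MAX_TRIALS."""
    times = list(looking_times)[:MAX_TRIALS]
    if len(times) <= BASELINE_TRIALS:
        return None
    baseline = sum(times[:BASELINE_TRIALS]) / BASELINE_TRIALS
    for index in range(BASELINE_TRIALS, len(times)):
        if times[index] <= CRITERION_FRACTION * baseline:
            return index + 1
    return None
```

For example, with made-up looking times of 20, 18, 16, 14, 10, and 8 s, the baseline is 18 s and the criterion is first met on trial 6. An infant who looks for 20 s on all 18 trials would be classed as a nonhabituator.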
Finally, the fact that even the nonhabituators looked longest at the novel test stimulus indicates that their longer looking to the correlated than to the uncorrelated test items cannot be explained by a simple familiarity preference (Hunter & Ames, 1988).
Insert Figure 7 about here
In addition to covering the essential age × correlation interaction and the correlation interpretation of this effect (with sufficiently deep learning), our cascade-correlation networks covered two other features of the infant data: discrimination of the habituated from the correlated test stimulus via the presence of two additional noncorrelated features, and more recovery to test stimuli with novel features than to test stimuli with familiar features. In contrast, comparable simulations with static back-propagation networks did not cover the infant data despite wide variation in designed topologies and parameter values. Although these back-propagation simulations were not successful in capturing the infant data, they were useful in establishing why cascade-correlation networks were successful. Namely, the growth of cascaded networks with cross connections seems critical to capturing both similarity and correlation effects and a developmental shift between them. A number of similar head-to-head competitions between cascade-correlation and static back-propagation networks on developmental phenomena have likewise favored cascade-correlation (Buckingham & Shultz, 1996; Shultz, Mareschal, & Schmidt, 1994). It may yet be possible to simulate these infant data with static networks using techniques we have not thought of, but probably something would have to change, either internal or external to the network, to implement the documented infant age differences.
Our cascade-correlation simulations show that deep learning allows detection of correlations between features, whereas shallower learning only allows learning about the features themselves. As noted, the corresponding psychological assumption is that older infants learn more from the same stimulus exposure time than do younger infants. This is a new interpretation of the infant data because previous explanations postulated unspecified qualitative shifts in processing with age. Our simulation suggests that the developmental shift from learning about features to learning about correlations among features could instead be due to quantitative increases in depth of learning. In networks, deeper learning is typically characterized by less error and sharper knowledge representations (Sirois & Shultz, 1998). Although there is evidence that older children learn more from identical experiences than younger children do (Case, Kurland, & Goldberg, 1982), the basis for this phenomenon is still unknown. Ultimately, the most useful models in this area will be ones that cover the existing developmental data, lead to new predictions that can be confirmed with infants, and explain the basis for developmental changes in learning. Our present model covers the developmental shift from learning about features to learning about correlations among features. The model also predicts a short-term shift of this sort during a single familiarization session, evidence for which is beginning to emerge. Further, the model suggests that the basis for both of these shifts is a quantitative increase in depth of learning. Just how and why depth of learning changes in the development of infant categorization is an important topic that requires further investigation.
Publication date: 2003